Abstract. This paper contains a new convergence analysis for the Lewis and Torczon GPS class of pattern search methods for linearly constrained optimization. The analysis is motivated by the desire to understand the behavior of the algorithm under hypotheses more consistent with properties satisfied in practice for a class of problems, discussed at various points in the paper, for which these methods are successful. Specifically, even if the objective function is discontinuous or extended valued, the methods find a limit point with some minimizing properties. Simple examples show that the strength of the optimality conditions at a limit point depends not only on the algorithm, but also on the directions it uses and on the smoothness of the objective at the limit point in question. The contribution of this paper is to provide a simple convergence analysis that supplies detail about the relation of optimality conditions to objective smoothness properties and to the defining directions for the algorithm, and it gives older results as easy corollaries.
1. Introduction. Generalized pattern search (GPS) algorithms were defined and analyzed by Torczon [28] for derivative-free unconstrained optimization on continuously differentiable functions using positive spanning directions [24]. Lewis and Torczon showed that if the objective is continuously differentiable and if the set of directions that define the local search is chosen properly, then the GPS framework and convergence theory extend to bound constrained optimization [23] and, more generally, to problems with a finite number of linear constraints [25] by the appealing "barrier" strategy of declaring any infeasible point to be unacceptable as a next iterate. Our purpose here is to provide a new, simpler, unified analysis for the methods in [28, 23, 25], and to help elucidate the relationship between the algorithm, the search directions, and the local smoothness properties of the objective at certain specified limit points of the algorithm.

The optimization problem considered in this paper is

$$\min_{x \in \Omega} f(x), \quad \text{where } f : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}. \tag{1.1}$$

We assume as in [25] that $\Omega = \{x \in \mathbb{R}^n : \ell \le Ax \le u\}$, where $A \in \mathbb{Q}^{m \times n}$ is a rational matrix, $\ell, u \in (\mathbb{R} \cup \{\pm\infty\})^m$ and $\ell \le u$. The way of handling the linear constraints here, and indeed the entire algorithm, is the same as in [23] and [25]; but a key part of the analysis here is more general and much shorter.

We believe that the primary niche of GPS methods within nonlinear optimization stems from their effectiveness when used with surrogates for what are generally expensive objective function evaluations. Certainly our interest in them is based on success using this approach on some interesting engineering design problems. This motivation influences the way we like to view the methods, as well as what we are willing to assume about the objective function $f$ to do the analysis. We will give contextual discussions of several ways in which GPS methods fill this niche, the first just below.

* Work of the first author was supported by NSERC (Natural Sciences and Engineering Research Council) fellowship PDF-207432-1998 during a post-doctoral stay at Rice University, and both authors were supported by DOE DE-FG03-95ER25257, AFOSR F49620-01-1-0013, The Boeing Company, Sandia LG-4253, ExxonMobil, and the LANL Computer Science Institute (LACSI) contract 03891-99-23.

† Departement de Mathematiques et de Genie Industriel, Ecole Polytechnique de Montreal, C.P. 6079, Succ. Centre-ville, Montreal (Quebec), H3C 3A7 Canada (charlesa@gerad.ca)

‡ Computational and Applied Mathematics Department, Rice University - MS 134, 6100 Main Street, Houston, Texas, 77005-1892 (dennis@caam.rice.edu)

Report Documentation Page (Standard Form 298, Rev. 8-98; Form Approved, OMB No. 0704-0188). Title: Analysis of Generalized Pattern Searches. Dates covered: 00-00-2006 to 00-00-2006. Performing organization: Computational and Applied Mathematics Department, Rice University, 6100 Main Street MS 134, Houston, TX, 77005-1892. Distribution/availability: approved for public release; distribution unlimited. Supplementary notes: the original document contains color images. Security classification (report, abstract, this page): unclassified.

For many applied problems, a call to the subroutine that evaluates $f(x)$ may unexpectedly return no value, which we model as $f(x) = \infty$. This important issue is discussed in detail in [5], where GPS is effective on a helicopter rotor design example for which no value is returned roughly 66% of the time. The issue is discussed in a different algorithmic and application context in [?]. The point is that, because this happens in many applications, we are precluded from making global smoothness assumptions, including even continuity. We are not the first to observe that GPS can work well on nonsmooth problems. Hough, Kolda and Torczon note in an earlier version of [20] that "while the theory for pattern search assumes that $f$ is continuously differentiable, pattern search methods can be effective on nondifferentiable (and even discontinuous) problems precisely because they do not explicitly rely on derivative information to drive the search."

We view the barrier approach as applying the algorithm not to $f$, but to the barrier function $f_\Omega = f + \psi_\Omega$, where $\psi_\Omega$ is the indicator function for $\Omega$: it is zero on $\Omega$ and $\infty$ elsewhere. Clearly then, we do not evaluate $f(x)$ if $x$ is infeasible, because its value is immaterial: the algorithm works with $f_\Omega$, and the value of $f_\Omega$ is $+\infty$ at all points that are either infeasible or at which $f$ is declared to be $+\infty$:

$$f_\Omega(x) = \begin{cases} f(x) & \text{if } x \in \Omega, \\ \infty & \text{otherwise.} \end{cases}$$
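A minimal Python sketch of the barrier function, for illustration only. The name `make_barrier`, the row-wise bounds `lower`/`upper`, and the convention that a failed black-box run returns `None` are assumptions of this sketch, not something prescribed by the paper. Feasibility is checked first, so $f$ is never called at an infeasible point, and a missing value is mapped to $+\infty$.

```python
import math
from typing import Callable, Optional, Sequence

# Assumed (hypothetical) black-box signature: returns None when no value is produced.
BlackBox = Callable[[Sequence[float]], Optional[float]]


def make_barrier(f: BlackBox,
                 A: Sequence[Sequence[float]],
                 lower: Sequence[float],
                 upper: Sequence[float]) -> Callable[[Sequence[float]], float]:
    """Return f_Omega = f + psi_Omega for Omega = {x : lower <= A x <= upper}.

    Entries of lower/upper may be -math.inf / math.inf for one-sided rows.
    """
    def f_omega(x: Sequence[float]) -> float:
        for row, lo, up in zip(A, lower, upper):
            ax = sum(a * xi for a, xi in zip(row, x))
            if ax < lo or ax > up:
                return math.inf                      # infeasible: f is never called
        value = f(x)
        return math.inf if value is None else value  # failed run is modeled as +inf
    return f_omega
```

Wrapping the black box this way means the optimizer only ever sees an extended-valued function, which is exactly the viewpoint taken in the analysis.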


The reason that we treat together all the methods in [28, 23, 25] that use the barrier approach is that, by viewing them as the same algorithm applied to $f_\Omega$, we can obtain their convergence results as corollaries of a single result, Theorem 3.7, which allows for extended values and other nonsmooth behavior. Our approach is first to identify a class of promising limit points produced by GPS applied to extended-valued discontinuous functions like $f_\Omega$. If $f$ is lower semicontinuous at such a limit point, we can make a weak optimality statement. Then we apply the Clarke calculus [9] locally to $f$ at such a point to relate progressively stronger optimality conditions to progressively stronger local smoothness assumptions at the limit point.

Thus, the structure of our results will be that, at some limit point whose existence is asserted independently of certain assumptions, we make those additional assumptions to draw stronger conclusions. This is standard for Newton or quasi-Newton methods ([?], e.g., Theorem 8.6, p. 216, or virtually all of [22]), but it has not been the norm for direct search methods.

Specifically, we observe without assuming any smoothness that there is a convergent subsequence of the sequence $\{x_k\}$ of iterates produced by the algorithm. Obviously, if $\{f(x_k)\}$ is bounded below, then $\lim_k f(x_k)$ is finite, since the sequence is nonincreasing. Thus, if $f$ is lower semicontinuous at any limit point $\bar{x}$ of the sequence of iterates, then $f(\bar{x}) \le \liminf_k f(x_k) = \lim_k f(x_k)$. Our analysis is of interest for the heat intercept design problem we give in [?], where $f$ is not continuous at one of the limit points generated, but a plot suggests that it is lower semicontinuous. In a case where $f(\bar{x}) = \infty$, we believe that an optimization code should notify the user that it has found an interesting point at which the subroutine that evaluates $f$ should be carefully examined in hopes of obtaining a value, which may correspond to a good design.
Again without any smoothness assumptions, we show that there is a limit point $\hat{x}$ of a subsequence of $\{x_k\}$ consisting of iterates that are local optimizers of $f$ to a progressively finer resolution of the current mesh at those iterates (a formal definition of the mesh is given in Section 2). The directional tests that led GPS to refine the mesh at the terms of the subsequence say exactly that difference quotients for the Clarke generalized directional derivative at $\hat{x}$ are nonnegative. If the Clarke derivatives exist at $\hat{x}$, as they will if $f$ is locally Lipschitz at $\hat{x}$, then these nonnegative difference quotients pass through the limit to give nonnegative Clarke derivatives in the directions used.
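The limit argument in the last sentence can be written out explicitly. This is a sketch under the following assumptions: $\{x_k\}_{k \in K}$ is a subsequence of mesh local optimizers converging to $\hat{x}$ with $\Delta_k \to 0$, the poll points $x_k + \Delta_k d$ are feasible (so that $f_\Omega = f$ there), and $f$ is Lipschitz near $\hat{x}$, so that Clarke's generalized directional derivative $f^\circ(\hat{x}; d)$ is finite. Since $(y, t) = (x_k, \Delta_k)$ is one admissible path in the $\limsup$, and each quotient is nonnegative by mesh local optimality,

$$f^\circ(\hat{x}; d) = \limsup_{y \to \hat{x},\ t \downarrow 0} \frac{f(y + t d) - f(y)}{t} \;\ge\; \limsup_{k \in K} \frac{f(x_k + \Delta_k d) - f(x_k)}{\Delta_k} \;\ge\; 0$$

for every direction $d$ used in the POLL infinitely often along $K$.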

Nonnegative directional derivatives in a set of directions are necessary conditions for optimality, but they are not the usual first-order conditions. To get those, we assume in addition that the generalized gradient of $f$ is a singleton. This extra smoothness causes the above directional optimality conditions to hold for all directions in the positive cone of those directions, and this, together with the right choice of directions, leads to the familiar first-order optimality conditions. We give examples that supplement those in [1] and show that our results are sharp in the sense that they predict the behavior of the algorithm.
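The passage from directional conditions to a zero gradient can be sketched as follows. Assume the problem is unconstrained, $f$ is strictly differentiable at $\hat{x}$ (so that $f^\circ(\hat{x}; v) = \nabla f(\hat{x})^T v$ for every $v$), and the directions $d_j$ with $f^\circ(\hat{x}; d_j) \ge 0$ positively span $\mathbb{R}^n$. Writing any $v \in \mathbb{R}^n$ as $v = \sum_j \lambda_j d_j$ with $\lambda_j \ge 0$,

$$\nabla f(\hat{x})^T v = \sum_j \lambda_j \, \nabla f(\hat{x})^T d_j \;\ge\; 0 .$$

Applying the same bound to $-v$ gives $\nabla f(\hat{x})^T v = 0$ for every $v$, hence $\nabla f(\hat{x}) = 0$.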

We believe that it is useful to understand how the algorithm behaves in such cases because there will generally be no way of knowing beforehand whether the "blackbox" function given to the algorithm is at all smooth, and our analysis describes the minimal optimality conditions that can be guaranteed. We obtain as immediate corollaries earlier results that assumed global continuous differentiability.

The remainder of the paper is organized as follows. In the next section, we give a brief description of the GPS algorithm class. We adhere to a slightly different, but equivalent, version of the Lewis and Torczon algorithm, because our major interest in these algorithms is for problems where they are used with inexpensive surrogates for an expensive function. For an illustration of how easily and effectively surrogates can be incorporated into this version of GPS, see [?]. In Section 3 we present the assumptions together with a discussion of our local smoothness conditions; then we give the key result and some easy corollaries for unconstrained problems, together with a discussion of these results, before we go on to the results for linear constraints. Section 4 is devoted to some concluding remarks.

2. Generalized pattern search algorithms. Generalized pattern search algorithms for unconstrained or linearly constrained minimization generate a sequence of iterates $\{x_k\}$ in $\mathbb{R}^n$ with nonincreasing objective function values. Because of our interest in surrogate-based optimization, we like to view each iteration as being divided into two phases: an optional SEARCH and a local POLL, defined next.

In the SEARCH step, the barrier objective function $f_\Omega$ is evaluated at a finite number of points on a mesh (a discrete subset of $\mathbb{R}^n$, defined below, whose fineness is parameterized by the mesh size parameter $\Delta_k > 0$) to try to find one that yields a lower objective function value than the incumbent. Any strategy may be used to select the mesh points that are candidates to replace the incumbent, as long as only finitely many points (including none) are selected.
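Because any finite strategy is allowed, the following is only one hypothetical SEARCH: it draws uniformly random nonnegative integer coefficient vectors $z$ and returns the mesh points $x_k + \Delta_k D z$, using the mesh form (2.4) given later in this section. The function name and the parameters `num_points`, `max_coeff` and the seeded `rng` are assumptions of this sketch; the caller would then evaluate $f_\Omega$ (or a surrogate) at the returned points.

```python
import random
from typing import List, Sequence


def random_mesh_search(x_k: Sequence[float], delta_k: float,
                       D_cols: Sequence[Sequence[float]],
                       num_points: int, max_coeff: int,
                       rng: random.Random) -> List[List[float]]:
    """Hypothetical SEARCH heuristic: draw finitely many mesh points
    x_k + delta_k * D z, with z a random vector of integers in [0, max_coeff].

    D_cols holds the columns d_j of D, each a list of length n.
    """
    n = len(x_k)
    points = []
    for _ in range(num_points):
        z = [rng.randint(0, max_coeff) for _ in D_cols]   # z in Z_+^{|D|}
        step = [sum(zj * d[i] for zj, d in zip(z, D_cols)) for i in range(n)]
        points.append([x_k[i] + delta_k * step[i] for i in range(n)])
    return points
```

Every returned point lies on the current mesh by construction, which is the only property of the SEARCH that the convergence analysis relies on.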

This is a key point. The SEARCH step accommodates whatever heuristics the user was already using to attack the problem with surrogates. One might do some random search on the mesh using the surrogate, or, as in the Boeing Design Explorer software [4], one might apply SQP to the surrogate problem and then move the solution to a nearby mesh point to choose the candidates at which to evaluate the expensive objective function in hopes of obtaining a better next iterate. Coope and Price [?] offer a possibility for a related framework that does not require pushing a surrogate solution to the mesh for it to become an acceptable trial point. It would be interesting to blend the analysis here with their related methods.

On the other hand, the freedom of the SEARCH step is definitely a theoretical liability. In [1] and here, there are examples of nonempty searches that spoil chances for the algorithm to find KKT points, and of empty searches that mire the algorithm at a poor point when a naive random selection from the current mesh in the SEARCH would generally lead to success. Regardless, this freedom must be retained. Indeed, for the Boeing example [3, ?], the algorithm with surrogates is much more efficient than Serafini's implementation of the Dennis-Torczon MDS/PDS algorithm [13]. This is not to disparage the MDS algorithm, which is very robust on that example.

Below, we offer terminology consistent with Coope and Price [12] to replace the usual "successful/unsuccessful" terminology in the GPS literature. The original terminology was adequate until it was recognized that the "unsuccessful" iterations were the important ones because they produce mesh local optimizers, while successful iterations produce only improved mesh points, which we define now.

When the incumbent is replaced, i.e., when $f_\Omega(x_{k+1}) < f_\Omega(x_k)$, or equivalently when $f(x_{k+1}) < f(x_k)$, then $x_{k+1}$ is said to be an improved mesh point. When the SEARCH step fails to provide an improved mesh point, the POLL step is invoked. This second step consists of evaluating the barrier objective function at the neighboring mesh points to see if a lower function value can be found there. A crucial practical feature, supported by the theory here and originally by that of Torczon [28], is that as soon as an improved mesh point is found, polling can stop immediately.
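The early-stopping ("opportunistic") POLL just described can be sketched in a few lines of Python. The function name and the return convention (a point/value pair, or `None` when $x_k$ is a mesh local optimizer) are illustrative assumptions; note that acceptance uses simple decrease only, with no sufficient decrease test.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def opportunistic_poll(f_omega: Callable[[Sequence[float]], float],
                       x_k: Sequence[float], f_k: float, delta_k: float,
                       D_k: Sequence[Sequence[float]]
                       ) -> Optional[Tuple[List[float], float]]:
    """Evaluate f_Omega on the poll set {x_k + delta_k d : d in D_k}, stopping
    at the first improved mesh point (simple decrease, no sufficient decrease).

    Returns (x_{k+1}, f_Omega(x_{k+1})), or None when x_k is a mesh local optimizer.
    """
    for d in D_k:
        trial = [xi + delta_k * di for xi, di in zip(x_k, d)]
        value = f_omega(trial)
        if value < f_k:
            return trial, value          # improved mesh point: stop polling now
    return None                          # every poll point failed to improve
```

When an improvement is found early, the remaining poll points are never evaluated, which is where the savings come from on expensive problems.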

When the POLL step fails to provide an improved mesh point, the current incumbent solution is said to be a mesh local optimizer (i.e., its objective function value is less than or equal to that of the neighboring mesh points). The algorithm then refines the mesh by setting the mesh size parameter

$$\Delta_{k+1} = \tau^{w_k} \Delta_k \tag{2.1}$$

for $0 < \tau^{w_k} < 1$, where $\tau > 1$ is a rational number that remains constant over all iterations, and $w_k \le -1$ is an integer bounded below by the constant $w^- \le -1$.

If either the SEARCH or the POLL step produces an improved mesh point, then the new point $x_{k+1} \ne x_k$ has a strictly lower objective function value (there is no sufficient decrease condition, another crucial practical feature supported by the theory in Torczon [28] and here), the mesh size parameter is kept the same or is increased to carry out far-reaching and inexpensive (if surrogates are used) SEARCH steps, and the process is repeated. The coarsening of the mesh follows the rule

$$\Delta_{k+1} = \tau^{w_k} \Delta_k \tag{2.2}$$

where $\tau > 1$ is defined above and $w_k \ge 0$ is an integer bounded above by $w^+ \ge 0$. Our experience with surrogate-based SEARCH steps [?] is that a great deal of progress can be made with few function values, and that at least $n + 1$ function evaluations are needed only to show local mesh optimality, which indicates that the mesh needs to be refined (see [21] for defining a minimal number of polling directions).

By modifying the mesh size parameters as above, it follows that for any $k \ge 0$ there exists an integer $r_k \in \mathbb{Z}$ such that

$$\Delta_k = \tau^{r_k} \Delta_0. \tag{2.3}$$
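Because $\tau$ is rational, rules (2.1) and (2.2) can be carried out in exact arithmetic, and (2.3) then holds with $r_k$ equal to the sum of the exponents used so far. A small sketch with Python's `fractions.Fraction`; the value $\tau = 3/2$ and the bounds $w^- = -2$, $w^+ = 2$ in the usage below are arbitrary choices for illustration.

```python
from fractions import Fraction


def update_mesh_size(delta: Fraction, tau: Fraction, w: int,
                     w_minus: int, w_plus: int) -> Fraction:
    """Return tau**w * delta: rule (2.1) when w <= -1, rule (2.2) when w >= 0.

    tau must be a rational number greater than 1 and w an integer in [w_minus, w_plus].
    """
    if tau <= 1:
        raise ValueError("tau must be greater than 1")
    if not (w_minus <= w <= w_plus):
        raise ValueError("w must lie in [w_minus, w_plus]")
    return delta * tau ** w


tau, delta0 = Fraction(3, 2), Fraction(1)
delta = delta0
exponents = [1, -1, -2, 2, -1]            # a hypothetical run: w_0, ..., w_4
for w in exponents:
    delta = update_mesh_size(delta, tau, w, -2, 2)
r_k = sum(exponents)                       # (2.3): Delta_k = tau**r_k * Delta_0
```

Exact rationals avoid floating-point drift, so every $\Delta_k$ is exactly of the form $\tau^{r_k}\Delta_0$; this lattice structure is what the mesh refinement results below exploit.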

The basic ingredient in the definition of the mesh is a set of positive spanning directions $D$ in $\mathbb{R}^n$ (more precisely, nonnegative linear combinations of the elements of the set $D$ span $\mathbb{R}^n$). There is great freedom in choosing these directions; only the following additional rule needs to be respected: each direction $d_j \in D$ (for $j = 1, 2, \ldots, |D|$) is the product $G z_j$ of the nonsingular generating matrix $G \in \mathbb{R}^{n \times n}$ by an integer vector $z_j \in \mathbb{Z}^n$. Note that the same generating matrix is used for all directions. For convenience, the set $D$ is also viewed as a real $n \times |D|$ matrix. Similarly, we denote by $Z$ the matrix whose columns are $z_j$, for $j = 1, 2, \ldots, |D|$; we can therefore write $D = GZ$. At iteration $k$, the mesh is centered around the current iterate $x_k \in \mathbb{R}^n$ and its fineness is parameterized through the mesh size parameter $\Delta_k$ as follows:

$$M_k = \{x_k + \Delta_k D z : z \in \mathbb{Z}_+^{|D|}\}, \tag{2.4}$$

where $\mathbb{Z}_+$ is the set of nonnegative integers. This way of describing the mesh differs from [?] because we think it is easier to understand and work with.

At each iteration, some positive spanning matrix $D_k$ composed of columns of $D$ is used to construct the poll set. We write $D_k \subseteq D$ to signify that the matrix $D_k$ is composed of columns of $D$. The poll set is composed of mesh points neighboring the current iterate $x_k$ in the directions of the columns of $D_k$:

$$\text{Poll set: } \{x_k + \Delta_k d : d \in D_k\}. \tag{2.5}$$

Rules for selecting $D_k$ may depend on the user's dynamic intervention during the current run or, for example, on the iteration number or the current iterate, i.e., $D_k = D(k, x_k) \subseteq D$.

The algorithm is stated formally in Figure 2.1.

Fig. 2.1. A basic GPS algorithm

Initialization:
Let $x_0$ be such that $f_\Omega(x_0)$ is finite, let $M_0$ be the mesh on $\mathbb{R}^n$ defined by $\Delta_0 > 0$, and let $D_0$ and $x_0$ be given (see equation (2.4)). Set the iteration counter $k$ to 0.

Search and poll step:
Perform the SEARCH and possibly the POLL steps (or only part of them) until an improved mesh point $x_{k+1}$ with the lowest $f_\Omega$ value so far is found on the mesh $M_k$ defined by equation (2.4).
- Optional SEARCH: Evaluate $f_\Omega$ on a finite subset of trial points on the mesh $M_k$ defined by equation (2.4) (the strategy that gives the set of points is usually provided by the user; it must be finite and the set can be empty).
- Local POLL: Evaluate $f_\Omega$ on the poll set defined in equation (2.5).

Parameter update:
If the SEARCH or the POLL step produced an improved mesh point, i.e., a feasible iterate $x_{k+1} \in M_k \cap \Omega$ for which $f_\Omega(x_{k+1}) < f_\Omega(x_k)$, then update $\Delta_{k+1} \ge \Delta_k$ according to rule (2.2).
Otherwise, $f_\Omega(x_k) \le f_\Omega(x_k + \Delta_k d)$ for all $d \in D_k$, and so $x_k$ is a mesh local optimizer. Set $x_{k+1} = x_k$ and update $\Delta_{k+1} < \Delta_k$ according to rule (2.1).
Increase $k \leftarrow k + 1$ and go back to the SEARCH and POLL step.
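To tie the pieces of Figure 2.1 together, here is a minimal Python sketch under strong simplifying assumptions of our own: the SEARCH is empty, $G = I$ and $D = D_k = [I, -I]$ for every $k$, $\tau = 2$ with $w_k = -1$ after a failed POLL and $w_k = 0$ after an improved mesh point, and the loop stops once $\Delta_k$ drops below a user tolerance `delta_min` (a practical stopping rule that the convergence theory does not use). The function names are hypothetical. Passing a barrier function as `f_omega` handles linear constraints as in the barrier approach: infeasible poll points simply never improve on the incumbent.

```python
import math
from typing import Callable, List, Sequence, Tuple


def coordinate_directions(n: int) -> List[List[float]]:
    """D = [I, -I], i.e., G = I and Z = [I, -I]: a positive spanning set of R^n."""
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return eye + [[-v for v in row] for row in eye]


def gps(f_omega: Callable[[Sequence[float]], float], x0: Sequence[float],
        delta0: float = 1.0, tau: float = 2.0, delta_min: float = 1e-6,
        max_iter: int = 10_000) -> Tuple[List[float], float, float]:
    """Minimal GPS sketch: empty SEARCH, opportunistic POLL with D_k = D = [I, -I],
    w_k = -1 after a failed POLL (rule 2.1), w_k = 0 after success (rule 2.2)."""
    x = list(x0)
    fx = f_omega(x)
    if not math.isfinite(fx):
        raise ValueError("f_Omega(x0) must be finite")
    D = coordinate_directions(len(x))
    delta = delta0
    for _ in range(max_iter):
        if delta < delta_min:              # practical stopping rule only
            break
        improved = False
        for d in D:                                          # POLL
            trial = [xi + delta * di for xi, di in zip(x, d)]
            f_trial = f_omega(trial)
            if f_trial < fx:                                 # improved mesh point
                x, fx, improved = trial, f_trial, True
                break                                        # stop polling early
        if not improved:                                     # mesh local optimizer
            delta /= tau                                     # refine the mesh
    return x, fx, delta
```

For example, minimizing $(x_1 - 1)^2 + (x_2 + 2)^2$ from the origin reaches $(1, -2)$ after three improved mesh points, after which every POLL fails and the mesh is refined until $\Delta_k$ falls below `delta_min`. A practical code would add a SEARCH step and might vary $D_k$ from iteration to iteration.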

The SEARCH strategy is the key to effectiveness. In practice it allows the use of heuristic and surrogate methods to explore the domain of the variables. For example, one might apply a few generations of a genetic algorithm on the mesh to $f_\Omega$ or to a surrogate. The convergence analysis is independent of the SEARCH step, provided that it is finite and returns a point (or points) on the mesh. The POLL step applied to $f_\Omega$, as we will see, guarantees that the limit point provided by the algorithm satisfies optimality conditions whose strength depends on the local smoothness of $f$ at the limit point.


3. Convergence analysis. Theorem 3.7 is our main result. It and Theorem 3.1 make no special assumptions about the crucial relationship between the directions $D$ and the feasible region $\Omega$. This means that they apply to quite general uses of GPS (see also the remark following Theorem 3.14), but without a connection between $\Omega$ and $D$, the resulting constrained optimality conditions are weak even when $f$ is smooth. Its immediate corollary (Theorem 3.9) is the strongest result we expect for stationarity in the unconstrained case (see [1] for supporting examples).

Since one of the objectives of the paper is to simplify the convergence analysis of GPS, we include the proofs of all the results leading to our main one, even if some of them can essentially be found in previous work, modulo the slightly different way of defining the mesh (we indicate the appropriate references).


3.1. Assumptions and smoothness requirements. We make the standard assumption that all iterates produced by GPS lie in a compact set (see [?]). A sufficient condition for this to hold is that the level set $L(x_0) = \{x \in \Omega : f(x) \le f(x_0)\}$ is compact. We cannot assume that $L(x_0)$ is compact, because we allow discontinuities and even $f(x) = \infty$, and so we do not know that $L(x_0)$ is closed. However, we can assume that $L(x_0)$ is bounded, so that its closure is compact.

Whatever we assume to ensure that the iterates are in a compact set, this already implies that there are convergent subsequences of the iteration sequence. This is enough to say that if $f$ is lower semicontinuous at such a limit point $\bar{x}$, then $f(\bar{x}) \le \lim_k f(x_k)$ for the entire iteration sequence. Of course, $f$ can be infinite arbitrarily near a point where it is lower semicontinuous, and so we can say nothing about any derivatives at such an $\bar{x}$. For that, we will consider an interesting set of subsequences identified by the algorithm. Specifically, we will be concerned here, as in [?], with the iterates $x_k$ that are mesh local optimizers for meshes that get infinitely fine. We will use $\bar{x}$ to denote generic limit points of the sequence of iterates, and $\hat{x}$ for limit points of mesh local optimizers for meshes that get infinitely fine. It is only at mesh local optimizers that $\Delta_k$ is reduced. This is not to say that other subsequences cannot exhibit interesting first-order behavior, but we can prove that these subsequences do, and that is a more specific statement. The analysis is simpler if we assume that the mesh size is never coarsened, since then the meshes obviously become infinitely fine for every sequence of mesh local optimizers. However, we will not use this assumption, since mesh coarsening can lead more rapidly to a more global solution.
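On a finite run, one way to read off candidate terms of such a subsequence is from the history of mesh sizes alone, since in Figure 2.1 the mesh is refined exactly at mesh local optimizers. The sketch below is a hypothetical helper, and its selection rule (keep a mesh local optimizer only when its $\Delta_k$ is strictly smaller than at every previously kept one) is our assumption. On the lattice (2.3), a strictly decreasing sequence of mesh sizes drops by a factor of at least $\tau$ each time, so infinitely many selected indices would force $\Delta_k \to 0$ along them.

```python
from typing import List, Sequence


def refining_indices(deltas: Sequence[float]) -> List[int]:
    """Given mesh sizes Delta_0, Delta_1, ... from a GPS run, return the indices k
    of mesh local optimizers (iterations with Delta_{k+1} < Delta_k) whose mesh size
    is strictly smaller than at every previously selected index."""
    selected: List[int] = []
    smallest = float("inf")
    for k in range(len(deltas) - 1):
        is_mesh_local_optimizer = deltas[k + 1] < deltas[k]
        if is_mesh_local_optimizer and deltas[k] < smallest:
            selected.append(k)
            smallest = deltas[k]
    return selected
```

For the history `[1, 1, 0.5, 1, 0.5, 0.25, 0.5, 0.125]`, iterations 1, 3, 4 and 6 are mesh local optimizers, but only 1 and 4 set a new smallest mesh size.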

To summarize, the convergence analysis provided below relies only on the following assumptions, and some results are stated in terms of the set of directions $D$.

A1: A function $f_\Omega = f + \psi_\Omega : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is available.

A2: The constraint matrix $A$ is rational.

A3: All iterates $\{x_k\}$ produced by the algorithm lie in a compact set.


This allows us to prove the following result, with an immediate but rather strange implication: stationary points are the least interesting limit points GPS produces. Of course, if all the limit points are stationary points, then all are equally interesting.
Theorem 3.1. Under assumptions A1 and A3, there exists at least one limit point of the iteration sequence $\{x_k\}$. If $f$ is lower semicontinuous at such a limit point $\bar{x}$, then $\lim_k f(x_k)$ exists and is greater than or equal to $f(\bar{x})$. If $f$ is continuous at every limit point of $\{x_k\}$, then every limit point has the same function value.

Proof. The existence of a limit point follows from A3, since every sequence in a compact set has a convergent subsequence. Since $f$ is lower semicontinuous at $\bar{x}$, we know that for any subsequence $\{x_k\}_{k \in K}$ of the iteration sequence that converges to $\bar{x}$, $\liminf_{k \in K} f(x_k) \ge f(\bar{x})$, which is finite. But since the subsequence of function values is a subsequence of a nonincreasing sequence, they have the same liminf. Thus, the entire sequence is also bounded below by $f(\bar{x})$, and so it converges. Finally, if $f$ is continuous at every limit point $\bar{x}$, then $f(\bar{x}) = \lim_{k \in K} f(x_k) = \lim_k f(x_k)$ for any subsequence converging to $\bar{x}$, so all limit points share the value $\lim_k f(x_k)$. ■

To prove more, we will need to assume more. In addition to A1-A3, previous work on pattern search algorithms assumes continuous differentiability of the function $f$ on a neighborhood of the level set $L(x_0) = \{x \in \Omega : f(x) \le f(x_0)\}$ [2, 23, 25, 28, 11, 12]. In the unconstrained case, Torczon [28] shows that for GPS there exists a limit point $\bar{x}$ satisfying $\nabla f(\bar{x}) = 0$, and our [2] shows the same result for every limit point $\hat{x}$ of any sequence of mesh local optimizers for which $\lim_k \Delta_k = 0$. Note that since every limit point of the GPS sequence is a point of continuity in this case, nonstationary limit points, whose possible existence is shown in [1], are very interesting, because with the right SEARCH step, or the right choice of directions, one can proceed to a feasible point with a better value of $f$. Our analysis below uses a weaker assumption at such a limit point (strict differentiability¹ of $f$ at $\hat{x}$ instead of continuous differentiability on $L(x_0)$).

First we easily show (under no smoothness assumptions) the existence of at least one limit point of a subsequence of mesh local optimizers on meshes that get infinitely fine. Then, for those limit points where $f$ is strictly differentiable, we show that the gradient is zero. To avoid confusion about the relative strength of assuming, in the context of GPS, that $f$ is locally Lipschitz, or strictly differentiable at a point, or continuously differentiable, we will provide examples following Theorems 3.7 and 3.9 for which those results apply and earlier results do not. The mesh refinement results were first proved in [28], with a different description of the meshes.

We now proceed with some results on the behavior of the mesh and the mesh size parameter. These results do not depend at all on the smoothness of $f_\Omega$; they use just the definition of the algorithm and the integrality of the matrix $Z$ used to construct the set of directions $D$. For a different framework, Coope and Price relax the conditions on the mesh, but they assume that the meshes become infinitely fine. This is an interesting tradeoff that puts the burden of ensuring that the meshes become infinitely fine on the implementation, but allows for search points off the mesh and more freedom in the definition of the meshes.


3.2. Mesh refinement. The main result of this section is that there is a subsequence of mesh local optimizers for which the mesh size parameter goes to zero. The first lemma shows that for each mesh $M_k$, the minimal distance over all pairs of distinct mesh points is bounded below by the mesh size parameter $\Delta_k$ times a positive constant that does not depend on $k$. In the Euclidean norm, the proof involves the smallest singular value of $G$ [28].

¹ The function $f$ is said to be strictly differentiable at $x$ if, for all $v$,

$$\lim_{y \to x,\ t \downarrow 0} \frac{f(y + tv) - f(y)}{t} = \nabla f(x)^T v$$

(see Clarke [9]).

Lemma  3.2.  For  any  integer  k  >  0,  and  any  norm  for  which  any  nonzero  integer 
vector  has  norm  at  least  1, 


min 

u^v£.Mk 


llu  ~  vll 


> 


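Proof. Using equation (2.4), we let $u = x_k + \Delta_k D z_u$ and $v = x_k + \Delta_k D z_v$ be two distinct points on $M_k$, where $z_u$ and $z_v$ are nonnegative integer vectors. Then

$$\|u - v\| = \Delta_k \|D(z_u - z_v)\| = \Delta_k \|G Z (z_u - z_v)\| \ge \Delta_k \frac{\|Z(z_u - z_v)\|}{\|G^{-1}\|} \ge \frac{\Delta_k}{\|G^{-1}\|}.$$

The first inequality holds because $\|Z(z_u - z_v)\| = \|G^{-1} G Z(z_u - z_v)\| \le \|G^{-1}\| \, \|G Z(z_u - z_v)\|$. The last inequality holds because $Z(z_u - z_v)$ is a nonzero integer vector (it is nonzero because $u \neq v$), so its norm is greater than or equal to one. ■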
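The Python sketch below is a small numerical sanity check of Lemma 3.2. It enumerates a finite window of a two-dimensional mesh and compares the smallest distance between distinct mesh points with $\Delta_k / \|G^{-1}\|$. The matrices $G$ and $Z$, the window size and the helper names are hypothetical choices for illustration only. It uses the max-norm, for which every nonzero integer vector has norm at least 1, and exact rational arithmetic, so that mesh points which coincide are merged exactly.

```python
from fractions import Fraction
from itertools import combinations, product

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def inv2(G):
    """Exact inverse of a nonsingular 2x2 matrix."""
    (a, b), (c, d) = G
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def induced_inf_norm(M):
    """Matrix norm induced by the max-norm: the largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in M)

def mesh_window(x, delta, D, zmax):
    """Mesh points x + delta * D z for z in {0, ..., zmax}^(number of columns of D)."""
    n, nd = len(D), len(D[0])
    return {tuple(x[i] + delta * sum(D[i][j] * z[j] for j in range(nd)) for i in range(n))
            for z in product(range(zmax + 1), repeat=nd)}

def min_spacing(points):
    """Smallest max-norm distance between two distinct points."""
    return min(max(abs(a - b) for a, b in zip(p, q)) for p, q in combinations(points, 2))

# Hypothetical data: G nonsingular, Z integral with positively spanning columns.
G = [[Fraction(2), Fraction(1)], [Fraction(0), Fraction(1)]]
Z = [[1, 0, -1], [0, 1, -1]]
D = matmul(G, Z)
delta = Fraction(1, 4)

points = mesh_window([Fraction(0), Fraction(0)], delta, D, zmax=3)
bound = delta / induced_inf_norm(inv2(G))
print(min_spacing(points), bound)  # 1/4 1/4
```

In this instance the bound is attained. The mesh points $(0,0)$ and $\Delta_k (1,1)$ differ by $\Delta_k G (0,1)^T$, whose max-norm is exactly $\Delta_k = \Delta_k / \|G^{-1}\|_\infty$.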

The previous result would not be true if the directions of $D$ were not constructed through an integral matrix $Z$. For example, in $\mathbb{R}^1$ the nonnegative integer combinations of the columns of $D = [-1, +\pi]$ form a dense subset of the real line. Indeed, there are no $Z = [z_1, z_2]$ with $z_1, z_2 \in \mathbb{Z}$ and $G \in \mathbb{R}$ such that $D = GZ$, since $\pi$ is irrational.
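The failure can be seen numerically. Two points generated by $D = [-1, +\pi]$ from the same $x_k$ differ by $\Delta_k |b\pi - a|$ for integers $a, b$. The sketch below computes the shortest nonzero such combination when $b$ is allowed to range up to $N$. The function name and the values of $N$ are illustrative choices. The spacing keeps shrinking as larger coefficients are allowed, so no lower bound of the form in Lemma 3.2 can hold.

```python
import math

def smallest_gap(N):
    """Smallest |b*pi - a| over integers 1 <= b <= N with a = round(b*pi) >= 0,
    i.e. the length of the shortest nonzero combination -a + b*pi with b <= N."""
    return min(abs(b * math.pi - round(b * math.pi)) for b in range(1, N + 1))

# The gaps come from the rational approximations 22/7, 333/106 and 355/113 of pi.
gaps = [smallest_gap(N) for N in (10, 110, 1000)]
print(gaps)
```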

The  next  lemma  shows  that  the  mesh  size  parameters  generated  by  the  algorithm 
are  bounded  above  (it  is  similar  to  a  result  in  [2]  for  categorical  variables). 

Lemma 3.3. There exists a positive integer $r^+$ such that $\Delta_k \le \Delta_0 \tau^{r^+}$ for any integer $k \ge 0$.

Proof. Using assumption A3, we let $X$ be a compact set in $\mathbb{R}^n$ that contains all iterates, and denote its diameter by $\gamma$ (i.e., the maximal distance between two of its points). If $\Delta_k > \gamma \|G^{-1}\|$, then Lemma 3.2 (with $v = x_k$) ensures that any trial point $u \in M_k$ different from $x_k$ would lie outside of $X$. But since no iterate is outside $X$, it follows that at any iteration whose mesh size parameter exceeds $\gamma \|G^{-1}\|$, the iterate $x_k$ is a mesh local optimizer, and so the mesh size parameter decreases. Thus $\Delta_k$ is bounded above by $\max\{\Delta_0, \gamma \|G^{-1}\| \tau^{w^+}\}$. The result follows by setting $r^+$ large enough so that $\Delta_0 \tau^{r^+} \ge \gamma \|G^{-1}\| \tau^{w^+}$. ■
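The last step of the proof is a simple integer search. The helper below finds the smallest admissible $r^+$ from $\Delta_0$, $\tau$, $\gamma$, $\|G^{-1}\|$ and $w^+$. It is a hypothetical illustration, not part of GPS. It assumes $\tau > 1$ and that a single coarsening step multiplies $\Delta_k$ by at most $\tau^{w^+}$, as in equation (2.2).

```python
def smallest_r_plus(delta0, tau, gamma, ginv_norm, w_plus):
    """Smallest positive integer r with delta0 * tau**r >= gamma * ginv_norm * tau**w_plus.

    delta0    : initial mesh size parameter Delta_0
    tau       : mesh update base, assumed > 1
    gamma     : diameter of a compact set X containing all iterates (assumption A3)
    ginv_norm : ||G^{-1}|| in the norm used by Lemma 3.2
    w_plus    : largest coarsening exponent (assumed meaning of w^+ in equation (2.2))
    """
    if tau <= 1:
        raise ValueError("tau must exceed 1")
    target = gamma * ginv_norm * tau ** w_plus
    r = 1
    while delta0 * tau ** r < target:
        r += 1
    return r

print(smallest_r_plus(1.0, 2.0, 10.0, 1.0, 1))  # 5, since 2**4 = 16 < 20 <= 32 = 2**5
```

Because $r^+ \ge 1$ and $\tau > 1$, the resulting bound $\Delta_0 \tau^{r^+}$ also dominates $\Delta_0$ itself.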

The proof of the next result is identical in spirit to that of the same result in Torczon [25], as adapted in [2] for categorical variables.

Proposition 3.4. The mesh size parameters satisfy $\liminf_{k \to +\infty} \Delta_k = 0$.

Proof. Suppose, by way of contradiction, that there exists a negative integer $r^-$ such that $0 < \Delta_0 \tau^{r^-} \le \Delta_k$ for all $k \ge 0$. Combining equation (2.3) with Lemma 3.3 implies that for any $k \ge 0$, $r_k$ takes its value among the integers of the finite set $\{r^-, r^- + 1, \ldots, r^+\}$.

Since $x_{k+1} \in M_k$, equation (2.4) ensures that $x_{k+1} = x_k + \Delta_k D z_k$ for some nonnegative integer vector $z_k$. Using equation (2.3) and substituting $\Delta_k = \Delta_0 \tau^{r_k}$, it follows that for any integer $N \ge 1$:


$$x_N = x_0 + \sum_{k=0}^{N-1} \Delta_k D z_k = x_0 + \Delta_0 D \sum_{k=0}^{N-1} \tau^{r_k} z_k = x_0 + \frac{p^{r^-}}{q^{r^+}} \Delta_0 D \sum_{k=0}^{N-1} p^{r_k - r^-} q^{r^+ - r_k} z_k,$$


where $p$ and $q$ are relatively prime integers satisfying $\tau = p/q$. For any $k$, the term $p^{r_k - r^-} q^{r^+ - r_k} z_k$ appearing in this last sum is an integer vector. It follows that all iterates lie on the translated integer lattice generated by $x_0$ and the columns of $\frac{p^{r^-}}{q^{r^+}} \Delta_0 D$.

Therefore, since all iterates belong to a compact set and lie on this lattice, there are only finitely many different iterates, and one of them must be visited infinitely many times. Each improved mesh point strictly decreases the objective, so an iterate cannot be revisited once it has been left. Hence, from some iteration on, every iterate is a mesh local optimizer. Therefore the rule presented in equation (2.2) is applied only finitely many times, and the one in equation (2.1) is applied infinitely many times. This contradicts the hypothesis that $\Delta_0 \tau^{r^-}$ is a lower bound for the mesh size parameter. ■
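The key algebraic step of this proof is the identity $\tau^{r_k} = \frac{p^{r^-}}{q^{r^+}} \, p^{r_k - r^-} q^{r^+ - r_k}$ with a nonnegative integer second factor. The sketch below checks it in exact rational arithmetic. The values $\tau = 3/2$, $r^- = -4$ and $r^+ = 3$ are hypothetical, and so is the helper name.

```python
from fractions import Fraction

def lattice_coefficient(tau, r_minus, r_plus, r_k):
    """Integer p**(r_k - r_minus) * q**(r_plus - r_k), where tau = p/q in lowest terms.
    It satisfies tau**r_k == p**r_minus / q**r_plus * (this integer)."""
    if not r_minus <= r_k <= r_plus:
        raise ValueError("r_k must lie in [r_minus, r_plus]")
    p, q = tau.numerator, tau.denominator
    return p ** (r_k - r_minus) * q ** (r_plus - r_k)

tau = Fraction(3, 2)          # hypothetical rational mesh update base
r_minus, r_plus = -4, 3       # hypothetical bounds on the exponents r_k
scale = Fraction(tau.numerator) ** r_minus / Fraction(tau.denominator) ** r_plus

for r_k in range(r_minus, r_plus + 1):
    c = lattice_coefficient(tau, r_minus, r_plus, r_k)
    assert isinstance(c, int) and tau ** r_k == scale * c
print(scale)  # 1/648
```

Every step $\Delta_k D z_k$ is therefore an integer combination of the columns of $\text{scale} \cdot \Delta_0 D$. This is why the iterates cannot leave a fixed lattice.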

3.3. Main convergence result. Since the mesh size parameter shrinks only when a mesh local optimizer is detected, Proposition 3.4 guarantees that there are infinitely many mesh local optimizers. The following definition specifies the subsequences we use.

Definition 3.5. A subsequence of the GPS iterates consisting of mesh local optimizers, $\{x_k\}_{k \in K}$ (for some subset of indices $K$), is said to be a refining subsequence if $\{\Delta_k\}_{k \in K}$ converges to zero.
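In a finite run, Definition 3.5 can only be approximated. One natural way to pick candidate indices from a run log is to keep the mesh local optimizers at which $\Delta_k$ reaches a new strict minimum. Along an infinite run with $\liminf_k \Delta_k = 0$, the selected mesh sizes then decrease to zero. The log format and the function below are hypothetical.

```python
def refining_indices(history):
    """Indices k of mesh local optimizers whose mesh size Delta_k is a new strict minimum.

    history: list of (delta_k, is_mesh_local_optimizer) pairs from a GPS run
    (a hypothetical log format)."""
    selected, best = [], float("inf")
    for k, (delta, is_mlo) in enumerate(history):
        if is_mlo and delta < best:
            selected.append(k)
            best = delta
    return selected

log = [(1.0, False), (1.0, True), (0.5, False), (1.0, True),
       (0.5, True), (0.25, True), (0.5, True), (0.125, True)]
print(refining_indices(log))  # [1, 4, 5, 7]
```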

The following result shows the existence of convergent refining subsequences. Notice that if coarsening of the mesh were not allowed (i.e., $w^+$ is set to 0 in equation (2.2)), then every subsequence of mesh local optimizers would be a refining subsequence, and so the next result would be trivial.

Theorem  3.6.  There  exists  at  least  one  convergent  refining  subsequence. 

Proof. Let $K''$ be the set of indices of iterates that are mesh local optimizers. Since the mesh is refined only at iterations when a mesh local optimizer is detected, Proposition 3.4 guarantees that there exists a subset of indices $K' \subset K''$ for which $\{\Delta_k\}_{k \in K'}$ converges to zero. Assumption A3 ensures that there exists a subset of indices $K \subset K'$ for which the subsequence of iterates $\{x_k\}_{k \in K}$ converges. ■

We show below that the limit of any refining subsequence satisfies first order optimality conditions appropriate to the local smoothness of $f$. It is shown in [1] that, even for a continuously differentiable $f$, the entire iteration sequence might not converge. There may even be infinitely many limit points, and not all of them need be stationary points.

Next is our basic, but key, result, in which we apply Clarke's [9] generalized directional derivatives in a very straightforward way to the pattern search analysis. The results that follow specialize this result. Clarke's derivative at $x$ in the direction $d$ is defined for functions that are Lipschitz near $x$. Loosely speaking, it is the limit superior of the difference quotients in the direction $d$, taken at points converging to $x$ with step sizes decreasing to zero. The precise definition is given in the proof (see equation (3.1)).

Theorem 3.7. Under assumptions A1-A3, suppose that $x$ is any limit of a refining subsequence, that $d$ is any direction in $D$ for which $f$ was evaluated at a poll step for infinitely many iterates in the subsequence, and that $f$ is Lipschitz near $x$. Then the generalized directional derivative of $f$ at $x$ in the direction $d$ is nonnegative, i.e., $f^\circ(x; d) \ge 0$.

Proof. Let $\{x_k\}_{k \in K}$ be a refining subsequence and $x$ its limit point, obtained as in the statement of the theorem. Since $f$ is Lipschitz near $x$, we have from Clarke [9] by definition that:

$$f^\circ(x; d) = \limsup_{y \to x,\ t \downarrow 0} \frac{f(y + td) - f(y)}{t} \ge \limsup_{k \in K} \frac{f(x_k + \Delta_k d) - f(x_k)}{\Delta_k}. \tag{3.1}$$

We need to know that the difference quotients are defined. First note that since $f$ is Lipschitz near $x$, it must be finite near $x$. A main point of the paper is to allow for extended valued functions and to justify the expedient of dealing with constraints by declining to evaluate the function $f$ at infeasible points. For that reason, we made the hypothesis that $f$ was actually evaluated infinitely many times in the direction $d$. Therefore, for the infinitely many $k \in K$ at which $f$ was evaluated in the direction $d$, the poll points $x_k + \Delta_k d$ are feasible. If they had not been, then $f_\Omega$ would have been infinite there, and so $f$ would not have been evaluated. Recall that if $x \notin \Omega$, then $f_\Omega(x)$ is set to $+\infty$ and $f(x)$ is not evaluated.


Thus, infinitely many of the right-hand quotients of (3.1) are defined, and in fact they are the same as for $f_\Omega$. This allows us to conclude that all of them must be nonnegative. Otherwise, the corresponding poll step would have succeeded in identifying an improved mesh point (recall that refining subsequences are constructed from mesh local optimizers). ■

In the unconstrained case, there will always be a positive spanning set of directions that satisfy the hypotheses of the previous theorem. In the constrained case, there may be no such $d$ if $D$ is defined in a way incompatible with the geometry of the constraints (see the example in [23]). Thus, in the next section, we will appeal to the construction in [25] to ensure that a sufficiently rich set of directions is used for bound or linear constraints. Again, we emphasize that GPS is a directional method, and the choice of directions is crucial.

The following example illustrates Theorem 3.7 on a Lipschitz function. This function looks like a convex function (quadratic, in fact) that has been contaminated by local noise that decreases in amplitude near the minimizer. This behavior is common enough in practice to be the target class for implicit filtering algorithms [18].

Example 3.8. Consider the function $f : \mathbb{R} \to \mathbb{R}$ defined as $f(x) = x^2 \left(2 + \sin\left(\frac{\pi}{x}\right)\right)$ for $x \neq 0$, with $f(0) = 0$. This function possesses infinitely many local optima near 0. One can show that $f$ is Lipschitz near 0, but it is not strictly differentiable there, and so it is certainly not continuously differentiable. In fact, the generalized gradient satisfies $\partial f(0) = [-\pi, \pi]$.

Suppose the GPS algorithm is applied to this problem with empty SEARCH steps, $x_0 = 1$, $\Delta_0 = 3$, $D = \{-1, 1\}$, $\Delta_{k+1} = \Delta_k$ when an improved mesh point is found, and $\Delta_{k+1} = \frac{1}{2}\Delta_k$ when a mesh local optimizer is detected. Then the sequence of iterates $\{x_k\}$ converges to 0, where $f^\circ(0; \pm 1) = \pi > 0$, as Theorem 3.7 guarantees. The proof of this claim can be read from Table 3.1, applied with $\alpha = 4^i$ for $i = 0, 1, 2, \ldots$


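Table 3.1

In four consecutive iterations, the iterates go from $x_k = \frac{1}{\alpha}$, $\Delta_k = \frac{3}{\alpha}$ to $x_{k+4} = \frac{1}{4\alpha}$, $\Delta_{k+4} = \frac{3}{4\alpha}$, where $\alpha$ is a positive integer.

$$
\begin{array}{c|c|c|c|c|c|l}
k & x_k & f(x_k) & \Delta_k & f(x_k - \Delta_k) & f(x_k + \Delta_k) & \text{Iteration status} \\
\hline
4i & \frac{1}{\alpha} & \frac{2}{\alpha^2} & \frac{3}{\alpha} & f\left(\frac{1-3}{\alpha}\right) \ge \frac{4}{\alpha^2} & f\left(\frac{1+3}{\alpha}\right) \ge \frac{16}{\alpha^2} & \text{mesh local optimizer} \\
4i+1 & \frac{1}{\alpha} & \frac{2}{\alpha^2} & \frac{3}{2\alpha} & f\left(\frac{2-3}{2\alpha}\right) = \frac{1}{2\alpha^2} & f\left(\frac{2+3}{2\alpha}\right) \ge \frac{25}{4\alpha^2} & \text{improved mesh point} \\
4i+2 & -\frac{1}{2\alpha} & \frac{1}{2\alpha^2} & \frac{3}{2\alpha} & f\left(\frac{-1-3}{2\alpha}\right) \ge \frac{4}{\alpha^2} & f\left(\frac{-1+3}{2\alpha}\right) = \frac{2}{\alpha^2} & \text{mesh local optimizer} \\
4i+3 & -\frac{1}{2\alpha} & \frac{1}{2\alpha^2} & \frac{3}{4\alpha} & f\left(\frac{-2-3}{4\alpha}\right) \ge \frac{25}{16\alpha^2} & f\left(\frac{-2+3}{4\alpha}\right) = \frac{1}{8\alpha^2} & \text{improved mesh point} \\
4(i+1) & \frac{1}{4\alpha} & \frac{1}{8\alpha^2} & \frac{3}{4\alpha} & & &
\end{array}
$$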
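The cycle in Table 3.1 can be reproduced by direct simulation. The Python sketch below runs the GPS instance of Example 3.8 with $x_0 = 1$ and $\Delta_0 = 3$, which is the case $\alpha = 1$ of the table. It polls $x_k - \Delta_k$ before $x_k + \Delta_k$. That ordering is an assumption, but here at most one poll point improves at each iteration, so it does not affect the outcome. Iterates and mesh sizes are kept as exact fractions, and only $f$ is evaluated in floating point.

```python
import math
from fractions import Fraction

def f(x):
    """Objective of Example 3.8: x^2 (2 + sin(pi/x)), with f(0) = 0."""
    if x == 0:
        return 0.0
    xf = float(x)
    return xf * xf * (2.0 + math.sin(math.pi / xf))

def gps_example_3_8(x0, delta0, n_iter):
    """GPS with empty SEARCH and D = {-1, +1}: Delta is kept after an improved
    mesh point and halved at a mesh local optimizer."""
    x, delta = Fraction(x0), Fraction(delta0)
    rows = []
    for k in range(n_iter):
        fx = f(x)
        new_x = None
        for d in (-1, 1):                  # poll set {x - delta, x + delta}
            trial = x + d * delta
            if f(trial) < fx:              # improved mesh point found
                new_x = trial
                break
        status = "mesh local optimizer" if new_x is None else "improved mesh point"
        rows.append((k, x, delta, status))
        if new_x is None:
            delta /= 2                     # refine the mesh
        else:
            x = new_x                      # keep the same mesh size
    return rows

for k, x, delta, status in gps_example_3_8(1, 3, 9):
    print(k, x, delta, status)
```

Every fourth row has the form $x_k = 1/\alpha$, $\Delta_k = 3/\alpha$ with $\alpha = 1, 4, 16, \ldots$, matching the table.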

Theorem 3.7 is the key to our analysis. Its proof follows directly from Clarke's definition of the generalized directional derivative. The reason is that unsuccessful polling at mesh local optimizers belonging to convergent refining subsequences provides exactly the nonnegative difference quotients that Clarke's derivatives need, since $x_k \to x$ and $\Delta_k \downarrow 0$. We believe that this illustrates an intimate relationship between Clarke's generalized directional derivatives and the directional algorithm GPS.

3.4. Corollaries for unconstrained optimization. Before we add the complication of choosing directions for linear constraints, we give some easy corollaries of Theorem 3.7 for the unconstrained case.
In addition to the assumption that $f$ is Lipschitz near $x$, we assume that the generalized gradient of $f$ at $x$ is a singleton. This is equivalent to assuming that $f$ is strictly differentiable at $x$, i.e., that there exists a $D_s f(x) \in \mathbb{R}^n$ such that $\lim_{y \to x,\ t \downarrow 0} \frac{f(y + tw) - f(y)}{t} = D_s f(x)^T w$ for all $w \in \mathbb{R}^n$ (see [9], Proposition 2.2.1 or Proposition 2.2.4). Since the generalized gradient is the singleton $\partial f(x) = \{D_s f(x)\}$, we use the standard notation for the gradient, $\nabla f(x) = D_s f(x)$.

Theorem 3.9. Under assumptions A1 and A3, let $\Omega = \mathbb{R}^n$ and let $x$ be any limit of a refining subsequence. If $f$ is strictly differentiable at $x$, then $\nabla f(x) = 0$.

Proof. Again from [9], if $f$ is strictly differentiable at $x$, then for any direction $w \neq 0$, $f^\circ(x; w) = \nabla f(x)^T w$. Now let $D' \subseteq D$ be any positive spanning set that is used infinitely many times in the refining subsequence. There must be at least one, since $D$ is finite and so has only finitely many subsets. Then, by Theorem 3.7, $0 \le \nabla f(x)^T d$ for each $d \in D'$. Thus, if we write $w$ as a nonnegative linear combination of the elements of $D'$, we see immediately that $\nabla f(x)^T w \ge 0$. But the same construction for $-w$ shows that $-\nabla f(x)^T w \ge 0$, and so $\nabla f(x) = 0$. ■
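The step that writes $w$ as a nonnegative linear combination of a positive spanning set can be made concrete for one particular set, the minimal positive basis $\{e_1, \ldots, e_n, -(e_1 + \cdots + e_n)\}$. This choice is hypothetical; the theorem allows any positive spanning set. The Python sketch below computes such coefficients. It also checks the conclusion of the proof: a vector $g$ with $g^T d \ge 0$ for every $d$ in this set must be zero.

```python
def nonneg_coefficients(w):
    """Coefficients lam >= 0 with w = sum_i lam[i] e_i + lam[n] * (-(e_1 + ... + e_n))."""
    c = max(0.0, -min(w))
    return [wi + c for wi in w] + [c]

def recombine(lam):
    """Evaluate sum_i lam[i] e_i - lam[n] (e_1 + ... + e_n)."""
    n = len(lam) - 1
    return [lam[i] - lam[n] for i in range(n)]

def nonneg_on_basis(g):
    """True if g^T d >= 0 for every d in {e_1, ..., e_n, -(e_1 + ... + e_n)}."""
    return all(gi >= 0 for gi in g) and -sum(g) >= 0

w = [3.0, -2.0, 0.5]
lam = nonneg_coefficients(w)
print(lam, recombine(lam))  # [5.0, 0.0, 2.5, 2.0] [3.0, -2.0, 0.5]
```

For this basis, `nonneg_on_basis(g)` can hold only when every $g_i \ge 0$ and $\sum_i g_i \le 0$, that is, when $g = 0$. This mirrors the last line of the proof.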

The following example, based on a function taken from [9], illustrates the applicability of Theorem 3.9. It shows that any realization of GPS converges to the global minimizer of this convex function, which is strictly differentiable at its minimizer but not continuously differentiable. We are not aware of any other results that apply to this example. The previous GPS analyses cannot be applied, since they assumed global continuous differentiability.

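Example 3.10. Consider the convex function $f : \mathbb{R} \to \mathbb{R}$ defined as $f(x) = \int_0^x \varphi(u)\, du$, where

$$\varphi(u) = \begin{cases} u & \text{if } u \le 0, \\ \frac{1}{k} & \text{if } \frac{1}{k+1} < u \le \frac{1}{k},\ k \in \mathbb{Z}_+. \end{cases}$$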


The function $f$ is Lipschitz near $x = 0$. It is shown in [9] that $f$ has kinks at $\frac{1}{k+1}$, with $\partial f\left(\frac{1}{k+1}\right) = \left[\frac{1}{k+1}, \frac{1}{k}\right]$ for $k = 1, 2, \ldots$ The corollary to Proposition 2.2.4 in [9] guarantees that $f$ is not continuously differentiable near $x$. Furthermore, $\partial f(0)$ reduces to the singleton $\{0\}$, and the same Proposition ensures that $f$ is strictly differentiable at $x$.

Applying Theorem 3.7 guarantees that any instance of any pattern search algorithm, with any set of initial parameters, generates a subsequence of iterates that converges to the global minimizer $x = 0$, where $\nabla f(x) = 0$. This holds because the function is locally Lipschitz everywhere, and 0 is the only point where Clarke's generalized derivatives are nonnegative in all directions of a positive spanning set.

We certainly are not claiming that the weaker smoothness conditions we use imply that GPS methods always find a minimizer. This has been known to be false since the inception of GPS methods. Simple convex counterexamples come from starting at just the wrong point and choosing just the right ill-suited directions. For example, consider $f(x) = |x_1| + |x_2|$ on $\mathbb{R}^2$, starting at $x_0 = (1, 0)^T$ with $D = \{(1, 0)^T, (-1, 1)^T, (-1, -1)^T\}$. The initial point $x_0$ is a mesh local optimizer for every $\Delta > 0$, and so, with an empty SEARCH step, the iteration never moves from $x_0$.

Unlike the corollaries below, which require more smoothness, our theorem applies to this simple example and describes exactly what happens: $f$ is regular at the limit point $x_0$, and the directional derivatives along the members of $D$ are nonnegative.
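The counterexample can be checked mechanically. The Python sketch below uses exact fractions to confirm two facts. First, no poll point $x_0 + \Delta d$ with $d \in D$ strictly improves on $f(x_0)$ for a wide range of $\Delta$ values. Second, the one-sided difference quotients along $D$ equal $1, 0, 0$. The list of $\Delta$ values and the helper name are illustrative choices.

```python
from fractions import Fraction

def f(x):
    """f(x) = |x1| + |x2|."""
    return abs(x[0]) + abs(x[1])

D = [(1, 0), (-1, 1), (-1, -1)]
x0 = (Fraction(1), Fraction(0))

def is_mesh_local_optimizer(x, delta):
    """True if no poll point x + delta*d, d in D, has a strictly lower value of f."""
    fx = f(x)
    return all(f((x[0] + delta * d[0], x[1] + delta * d[1])) >= fx for d in D)

deltas = [Fraction(2) ** j for j in range(-30, 6)] + [Fraction(3, 7), Fraction(10)]
print(all(is_mesh_local_optimizer(x0, delta) for delta in deltas))  # True

t = Fraction(1, 1000)
quotients = [(f((x0[0] + t * d[0], x0[1] + t * d[1])) - f(x0)) / t for d in D]
print(quotients)  # [Fraction(1, 1), Fraction(0, 1), Fraction(0, 1)]
```

Exact arithmetic matters here. Along $(-1, 1)^T$ the value $|1 - \Delta| + \Delta$ equals $f(x_0) = 1$ for $\Delta \le 1$, and floating-point rounding could otherwise report a spurious decrease.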

The  advantage  of  our  analysis  over  the  previous  ones  is  that  it  can  be  applied  to 
a  wider  class  of  problems,  and  that  it  says  what  actually  happens  when  the  algorithm 
is  applied  to  them. 
The following two corollaries assume continuous differentiability. We have discussed how, for our applications, this assumption is unlikely to be satisfied, except perhaps locally. We include these results only to tie our results here to earlier results that use global continuous differentiability. The first corollary strengthens our result in [2]. It shows that the gradients along any refining subsequence converge to zero, even if the subsequence itself does not converge.

Corollary 3.11. Let $\Omega = \mathbb{R}^n$ and let $f$ be continuously differentiable on a neighborhood of a compact set containing all the iterates $\{x_k\}$. Then, for any refining subsequence $\{x_k\}_{k \in K}$, $0 = \lim_{k \in K} \nabla f(x_k)$.

Proof. We have assumed A3, A2 is vacuous, and continuous differentiability implies assumption A1. If $x$ is any limit point of a refining subsequence, then continuous differentiability implies strict differentiability at $x$, and so $\nabla f(x) = 0$ by Theorem 3.9. Since the continuous image of a compact set is compact, the entire sequence of gradients of any refining subsequence lies in a compact set. Thus, there must be a subsequence $\{x_k\}_{k \in K'}$ of the refining subsequence for which $\lim_{k \in K'} \|\nabla f(x_k)\| = \limsup_{k \in K} \|\nabla f(x_k)\|$. But then $\{x_k\}_{k \in K'}$ has a convergent subsequence. Its limit point has a zero gradient because it is a limit point of a refining subsequence, and so, by continuity of $\nabla f$, $0 = \limsup_{k \in K} \|\nabla f(x_k)\|$. ■

A consequence of the previous result is that, under the assumption that $f$ is continuously differentiable, any limit point of a refining subsequence has a zero gradient.

The fact that, under the assumption of continuous differentiability, the limit of the gradients of any refining subsequence is zero was pointed out in M-. Earlier, under strong restrictions on the algorithm, it was shown in [25] that $0 = \lim_k \nabla f(x_k)$. One of those restrictions is that $\lim_k \Delta_k = 0$. As we proved above, this alone is enough to say that the limit of the gradients at the mesh local optimizers is zero, since the mesh local optimizers then form a refining subsequence. Thus, we will not discuss the restrictions needed for the stronger result, since they are too constraining for our class of problems.

The next corollary (really a corollary of Corollary 3.11) is Torczon's result from [25], strengthened by the same result from [HI-.

Corollary 3.12. Let $\Omega = \mathbb{R}^n$ and let $f$ be continuously differentiable on a neighborhood of a compact set containing all the iterates $\{x_k\}$. Then some limit point $x$ of $\{x_k\}$ satisfies $\nabla f(x) = 0$. The limit of the gradients for any refining subsequence is zero.

Proof. Every refining subsequence is a subsequence of $\{x_k\}$. ■

In summary, if assumptions A1 and A3 are satisfied, then the algorithm guarantees the following hierarchy of convergence behavior.

(i) If $f$ is lower semicontinuous at any limit point $x$ of the GPS iteration sequence, then Theorem 3.1 says that $f(x) \le \lim_k f(x_k)$.

(ii) Every limit point of the iteration sequence at which $f$ is continuous has the same function value $\lim_k f(x_k)$, whether or not it is a stationary point. Thus, GPS may produce a nonstationary limit point [1]. Such a point must necessarily be a limit point of improved mesh points (formerly called successful iterations), and there is a descent direction from it. So, despite GPS finding a stationary point, the directions were poorly suited to the problem.

(iii) There is at least one $x$ that is a limit point of a refining subsequence, i.e., $x$ is a limit point of a sequence of local optimizers on meshes that get infinitely fine. If the function $f$ is lower semicontinuous but not even Lipschitz near $x$, then nothing beyond the above is claimed about the optimality conditions satisfied by $x$.
(iv) If $f$ is Lipschitz near $x$, then Theorem 3.7 holds, and Clarke's generalized derivatives satisfy $f^\circ(x; d) \ge 0$ for some directions $d \in D$ that form a positive spanning set. In addition, $f(x) = \lim_k f(x_k)$, since $f$ is continuous at $x$.

(v) If $f$ is regular² at $x$, then the directional derivatives satisfy $f'(x; d) \ge 0$ for some directions $d \in D$ forming a positive spanning set, and $f(x) = \lim_k f(x_k)$.


(vi) If $f$ is strictly differentiable at $x$, then Theorem 3.9 holds and $\nabla f(x) = 0$, but its function value $\lim_k f(x_k)$ is the same as at any other limit point of the entire GPS iteration sequence at which $f$ is continuous (by (ii)).

(vii) If $f$ is globally continuously differentiable (as assumed in earlier analyses), then every limit point of a refining subsequence is a stationary point, as in item (vi). In addition, the gradients along a refining subsequence converge to zero, whether or not the subsequence converges. However, as was shown in [1], there can still be limit points of the entire GPS iteration sequence that are not stationary points. Such points have the same function value as the stationary points, but from each of them there is a descent direction that leads to lower function values.

3.5. Linearly constrained convergence results. In this section, we consider only the case where $\Omega$ is defined through a finite set of linear constraints. To prove the relevant optimality results, we have to assume that $D$, even though finite, is rich enough to generate poll sets that conform to the geometry of the boundary of $\Omega$. Furthermore, to apply our proof technique, we must ensure that the spanning sets that reflect this geometry get used infinitely many times as we converge to a point on the boundary. Lewis and Torczon [25] show how to use standard linear algebra tools to generate the requisite positive spanning matrices $D_k \subseteq D$. This relies on assumption A2, the rationality of the constraint matrix $A$.

We pause to remind the reader that, for $x \in \Omega$, the tangent cone to $\Omega$ at $x$ is $T_\Omega(x) = \mathrm{cl}\{\mu(w - x) : \mu \ge 0,\ w \in \Omega\}$. The normal cone to $\Omega$ at $x$ is denoted $N_\Omega(x)$. It can be written as the polar of the tangent cone: $N_\Omega(x) = \{v \in \mathbb{R}^n : v^T w \le 0 \text{ for all } w \in T_\Omega(x)\}$. Here it is the nonnegative span of all the outwardly pointing normals of the constraints active at $x$.

It would add unnecessary length to this paper to rewrite the construction given by Lewis and Torczon [25] for $D$ and the rule for choosing $D_k$ from $D$ at each iteration (their notation for $D_k$ is $\Gamma_k$). The construction is presented quite succinctly in Section 8 of [25], where they consider implementation issues, including difficulties inherent to degenerate constraints. We will use the following abstracted version of their direction choice.

Definition 3.13. A rule for selecting the positive spanning sets $D_k = D(k, x_k) \subseteq D$ conforms to $\Omega$ for some $\epsilon > 0$ if, at each iteration $k$ and for each $y$ in the boundary of $\Omega$ for which $\|y - x_k\| < \epsilon$, $T_\Omega(y)$ is generated by nonnegative linear combinations of the columns of a subset of $D_k$.

With this definition, we are ready for our next convergence result. Note that if $x_k \in \Omega$ is not near the boundary, then $D_k$ need only provide a positive spanning set for $\mathbb{R}^n$, which is completely sensible. However, in our experience, it is best not to take $\epsilon$ so small that the iterates crowd up against the boundary of $\Omega$ and the mesh size becomes small. This is mitigated somewhat by allowing variable coarsening of the mesh, as in equation (2.2).


Theorem 3.14. Under assumptions A1-A3, suppose that $f$ is strictly differentiable at a limit point $x$ of a refining subsequence, and that the rule for selecting the positive spanning sets $D_k = D(k, x_k) \subseteq D$ conforms to $\Omega$ for an $\epsilon > 0$. Then $\nabla f(x)^T w \ge 0$ for all $w \in T_\Omega(x)$, and $-\nabla f(x) \in N_\Omega(x)$. Thus, $x$ is a KKT point.

2 The function $f$ is said to be regular at $x$ if, for all $v$, the one-sided directional derivative $f'(x; v)$ exists and coincides with $f^\circ(x; v)$ (see Clarke [9]).

Proof. If $x$ is interior to $\Omega$, then the result is just Theorem 3.9, and so we can proceed directly to the case where $x$ is on the boundary of $\Omega$.

Suppose that the rule for selecting $D_k \subseteq D$ conforms to $\Omega$ for some fixed $\epsilon > 0$, and that there are finitely many linear constraints. Since $\|x_k - x\| < \epsilon$ for all large $k \in K$, for such $k$ a subset of the columns of $D_k$ generates $T_\Omega(x)$. Because $D$ is finite, there can only be finitely many different such sets for $k \in K$. Let $D_x \subseteq D$ be one of them that occurs infinitely many times.

Theorem 3.7 implies that $\nabla f(x)^T d \ge 0$ for every column $d$ of $D_x$. Every $w \in T_\Omega(x)$ is a nonnegative linear combination of the columns of $D_x$, so it follows that $\nabla f(x)^T w \ge 0$. To complete the proof, we multiply both sides by $-1$. Since the normal cone is the polar of the tangent cone, we conclude that $-\nabla f(x) \in N_\Omega(x)$. ■
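The conclusion of Theorem 3.14 can be made concrete for the special case of bound constraints $\ell \le x \le u$, a hypothetical instance of the linear constraints of this section. There the generators of $T_\Omega(x)$ can be read off coordinate by coordinate. They are always columns of $D = [I, -I]$, so that choice conforms to $\Omega$ for any $\epsilon$. The Python sketch below lists those generators and tests $\nabla f(x)^T d \ge 0$ for every generator $d$, which is equivalent to $-\nabla f(x) \in N_\Omega(x)$. The function names and the tolerance parameter are illustrative assumptions.

```python
def unit(n, i, sign):
    """The vector sign * e_i in R^n."""
    v = [0.0] * n
    v[i] = sign
    return v

def tangent_generators(x, lower, upper):
    """Generators of the tangent cone of the box {lower <= y <= upper} at x:
    +e_i unless x_i is at its upper bound, -e_i unless x_i is at its lower bound."""
    n = len(x)
    gens = []
    for i in range(n):
        if x[i] < upper[i]:
            gens.append(unit(n, i, 1.0))
        if x[i] > lower[i]:
            gens.append(unit(n, i, -1.0))
    return gens

def is_kkt_for_box(grad, x, lower, upper, tol=0.0):
    """True if grad^T d >= -tol for every tangent-cone generator d at x."""
    return all(sum(g * di for g, di in zip(grad, d)) >= -tol
               for d in tangent_generators(x, lower, upper))

x, lower, upper = [0.0, 0.5], [0.0, 0.0], [1.0, 1.0]
print(tangent_generators(x, lower, upper))  # [[1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
print(is_kkt_for_box([2.0, 0.0], x, lower, upper))   # True
print(is_kkt_for_box([-1.0, 0.0], x, lower, upper))  # False
```

At a point on the lower bound of $x_1$, the test accepts a gradient pointing into the feasible region in that coordinate and requires the interior coordinates of the gradient to vanish. This is the familiar KKT condition for a box.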

Remark 3.15. If $f$ were only assumed to be Lipschitz near $x$, then we could still conclude, as in Theorem 3.7, that $f^\circ(x; d) \ge 0$ for every column $d$ of $D_x$.

The  following  corollary  is  Lewis  and  Torczon’s  result  from  [25]  which  relies  on  a 
stronger  differentiability  assumption. 

Corollary 3.16. Suppose A2 and A3 hold, $f$ is continuously differentiable on a neighborhood of a compact set containing all the iterates $\{x_k\}$, and the rule for selecting the positive spanning sets $D_k = D(k, x_k) \subseteq D$ conforms to $\Omega$ for an $\epsilon > 0$. Then there exists a limit point $x$ of $\{x_k\}$ such that $\nabla f(x)^T w \ge 0$ for all $w \in T_\Omega(x)$, and $-\nabla f(x) \in N_\Omega(x)$. Thus, $x$ is a KKT point.

Proof. The proof follows from Theorem 3.14, since every refining subsequence is a subsequence of $\{x_k\}$ and continuous differentiability implies strict differentiability. ■


4. Concluding remarks. This paper brings together three ingredients. The first is the work of Lewis and Torczon on ways to choose the directions and on properties of the mesh. The second is some observations of ours about what is needed to obtain convergence of those algorithms (such as refining subsequences). The third is elements of nonsmooth analysis set forth by Clarke. Clarke's analysis is perfectly suited to expose the first order optimality conditions at limit points of certain subsequences of the GPS iterates. It does so under weakened assumptions that correspond to some real problems for which GPS is quite effective.

We  believe  that  our  analysis  helps  confirm  a  remark  of  [25]  that  GPS  methods 
for  general  constraints  will  not  be  based  on  the  appealingly  simple  barrier  strategy  of 
placing  a  high  function  value  on  infeasible  trial  points.  In  [3],  we  suggest  and  analyze 
a  GPS  algorithm  for  general  constraints  based  not  on  a  single  objective,  but  on  the 
interesting  new  filter  approach  of  Fletcher  et  al.  H5],  [IS]  and  m-  In  [26],  Lewis  and 
Torczon  give  a  successive  augmented  Lagrangian  pattern  search  approach  together 
with  its  convergence  analysis. 

Finally,  we  wish  to  acknowledge  a  helpful  referee  and  Major  Mark  Abramson 
USAF  for  many  insightful  comments  that  improved  the  presentation. 